(57) Abstract: The present invention provides improvements to prior art audio codecs that generate a stereo illusion through
post-processing of a received mono signal. These improvements are accomplished by extracting stereo-image describing parameters
at the encoder side, which are transmitted and subsequently used to control a stereo generator at the decoder side. Furthermore,
the invention bridges the gap between simple pseudo-stereo methods and current methods of true stereo coding by using a new
form of parametric stereo coding. A stereo-balance parameter is introduced, which enables more advanced stereo modes and, in
addition, forms the basis of a new method of stereo coding of spectral envelopes, of particular use in systems where guided HFR
(High Frequency Reconstruction) is employed. As a special case, the application of this stereo-coding scheme in scalable HFR-based
codecs is described.
EFFICIENT AND SCALABLE PARAMETRIC STEREO CODING FOR LOW
BITRATE AUDIO CODING APPLICATIONS


TECHNICAL FIELD 

The present invention relates to low bitrate audio source coding systems. Different parametric
representations of the stereo properties of an input signal are introduced, and their application at the
decoder side is explained, ranging from pseudo-stereo to full stereo coding of spectral envelopes, the
latter of which is especially suited for HFR-based codecs.



BACKGROUND OF THE INVENTION 

Audio source coding techniques can be divided into two classes: natural audio coding and speech coding. 

At medium to high bitrates, natural audio coding is commonly used for speech and music signals, and
stereo transmission and reproduction are possible. In applications where only low bitrates are available,
e.g. Internet streaming audio targeted at users with slow telephone modem connections, or the
emerging digital AM broadcasting systems, mono coding of the audio program material is unavoidable.
However, a stereo impression is still desirable, in particular when listening with headphones, in which
case a pure mono signal is perceived as originating from "within the head", which can be an unpleasant
experience.

One approach to address this problem is to synthesize a stereo signal at the decoder side from a received
pure mono signal. Over the years, several different "pseudo-stereo" generators have been proposed. For
example, [US patent 5,883,962] describes enhancement of mono signals by adding delayed/phase-shifted
versions of a signal to the unprocessed signal, thereby creating a stereo illusion. The processed signal is
added to the original signal for each of the two outputs at equal levels but with opposite signs, ensuring
that the enhancement signals cancel if the two channels are added later on in the signal path. [PCT WO
98/57436] shows a similar system, albeit without the above mono-compatibility of the enhanced signal.
Prior art methods have in common that they are applied as pure post-processes. In other words, no
information on the degree of stereo width, let alone position in the stereo sound stage, is available to the
decoder. Thus, the pseudo-stereo signal may or may not resemble the stereo character of the original
signal. A particular situation where prior art systems fall short is when the original signal is a pure mono
signal, which is often the case for speech recordings. This mono signal is blindly converted to a synthetic
stereo signal at the decoder, which in the speech case often causes annoying artifacts and may reduce
clarity and speech intelligibility.
Other prior art systems, aiming at true stereo transmission at low bitrates, typically employ a sum and
difference coding scheme. The original left (L) and right (R) signals are converted to a sum signal,
$S = (L + R)/2$, and a difference signal, $D = (L - R)/2$, which are subsequently encoded and transmitted.
The receiver decodes the S and D signals, whereupon the original L/R signal is recreated through the
operations $L = S + D$ and $R = S - D$. The advantage of this is that there is very often redundancy
between L and R, whereby D contains less information to be encoded than S and thus requires fewer bits.
Clearly, the extreme case is a pure mono signal, i.e. L and R are identical. A traditional L/R codec encodes
this mono signal twice, whereas an S/D codec detects this redundancy, and the D signal (ideally) does not
require any bits at all. Another extreme is the situation where $R = -L$, corresponding to "out of phase"
signals. Now the S signal is zero, whereas the D signal computes to L. Again, the S/D scheme has a clear
advantage over standard L/R coding. However, consider the situation where e.g. $R = 0$ during a passage,
which was not uncommon in the early days of stereo recordings. Both S and D equal $L/2$, and the S/D
scheme does not offer any advantage. On the contrary, L/R coding handles this very well: the R signal
does not require any bits. For this reason, prior art codecs employ adaptive switching between those two
coding schemes, depending on which method is most beneficial at a given moment. The above examples
are merely theoretical (except for the dual mono case, which is common in speech-only programs). In
practice, real-world stereo program material contains significant amounts of stereo information, and even
if the above switching is implemented, the resulting bitrate is often still too high for many applications.
Furthermore, as can be seen from the resynthesis relations above, very coarse quantization of the D signal
in an attempt to further reduce the bitrate is not feasible, since the quantization errors translate into
non-negligible level errors in the L and R signals.
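The sum/difference relations above can be illustrated with a minimal Python sketch. It assumes plain lists of samples and no quantization; the helper names `sd_encode`, `sd_decode` and `energy` are illustrative only.

```python
def sd_encode(left, right):
    """Sum/difference transform: S = (L + R)/2, D = (L - R)/2."""
    s = [(l + r) / 2 for l, r in zip(left, right)]
    d = [(l - r) / 2 for l, r in zip(left, right)]
    return s, d


def sd_decode(s, d):
    """Resynthesis: L = S + D, R = S - D."""
    left = [a + b for a, b in zip(s, d)]
    right = [a - b for a, b in zip(s, d)]
    return left, right


def energy(x):
    return sum(v * v for v in x)


left = [0.5, -0.25, 1.0, 0.75]

# Dual mono (R = L): D vanishes, so an S/D codec spends no bits on it.
s, d = sd_encode(left, left)
print(energy(d))              # 0.0

# Out of phase (R = -L): S vanishes and D equals L.
s, d = sd_encode(left, [-v for v in left])
print(energy(s), d == left)   # 0.0 True

# One silent channel (R = 0): S = D = L/2, so S/D offers no advantage.
s, d = sd_encode(left, [0.0] * 4)
print(s == d)                 # True
```

In the last case, quantizing D to zero would make the decoder output L/2 in both channels, which illustrates the level errors that coarse quantization of D causes.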



SUMMARY OF THE INVENTION

The present invention employs detection of signal stereo properties prior to coding and transmission. In
the simplest form, a detector measures the amount of stereo perspective present in the input stereo
signal. This amount is then transmitted as a stereo-width parameter, together with an encoded mono sum
of the original signal. The receiver decodes the mono signal and applies the proper amount of stereo
width, using a pseudo-stereo generator controlled by said parameter. As a special case, a mono input
signal is signaled as zero stereo width, and correspondingly no stereo synthesis is applied in the decoder.
According to the invention, useful measures of the stereo width can be derived e.g. from the difference
signal or from the cross-correlation of the original left and right channels. The values of such
computations can be mapped to a small number of states, which are transmitted at an appropriate fixed
rate in time, or on an as-needed basis. The invention also teaches how to filter the synthesized stereo
components, in order to reduce the risk of unmasking coding artifacts, which are typically associated
with low bitrate coded signals.
Alternatively, the overall stereo balance or localization in the stereo field is detected in the encoder. This
information, optionally together with the above width parameter, is efficiently transmitted as a balance
parameter, along with the encoded mono signal. Thus, displacements to either side of the sound stage can
be recreated at the decoder by correspondingly altering the gains of the two output channels. According
to the invention, this stereo-balance parameter can be derived from the quotient of the left and right signal
powers. The transmission of both types of parameters requires very few bits compared to full stereo
coding, whereby the total bitrate demand is kept low. In a more elaborate version of the invention, which
offers a more accurate parametric stereo depiction, several balance and stereo-width parameters are used,
each one representing a separate frequency band.

The balance parameter generalized to per-frequency-band operation, together with a corresponding per-band
level parameter, calculated as the sum of the left and right signal powers, enables a new, arbitrarily
detailed representation of the power spectral density of a stereo signal. A particular benefit of this
representation, in addition to the benefits from stereo redundancy that S/D systems also take advantage
of, is that the balance signal can be quantized with less precision than the level signal, since the
quantization error, when converting back to a stereo spectral envelope, causes an "error in space", i.e. in
perceived localization in the stereo panorama, rather than an error in level. Analogous to a traditional
switched L/R and S/D system, the level/balance scheme can be adaptively switched off in favor of a
level_L/level_R representation, which is more efficient when the overall signal is heavily offset towards
either channel. The above spectral envelope coding scheme can be used whenever efficient coding of
power spectral envelopes is required, and can be incorporated as a tool in new stereo source codecs. A
particularly interesting application is in HFR systems that are guided by information about the highband
envelope of the original signal. In such a system, the lowband is coded and decoded by means of an
arbitrary codec, and the highband is regenerated at the decoder using the decoded lowband signal and the
transmitted highband envelope information [PCT WO 98/57436]. Furthermore, it becomes possible to
build a scalable HFR-based stereo codec by locking the envelope coding to level/balance operation. Here
the level values are fed into the primary bitstream, which, depending on the implementation, typically
decodes to a mono signal. The balance values are fed into the secondary bitstream, which, in addition to
the primary bitstream, is available to receivers close to the transmitter, taking an IBOC (In-Band
On-Channel) digital AM broadcasting system as an example. When the two bitstreams are combined, the
decoder produces a stereo output signal. In addition to the level values, the primary bitstream can contain
stereo parameters, e.g. a width parameter. Thus, decoding of this bitstream alone already yields a stereo
output, which is improved when both bitstreams are available.



BRIEF DESCRIPTION OF THE DRAWINGS

The present invention will now be described by way of illustrative examples, not limiting the scope or 
spirit of the invention, with reference to the accompanying drawings, in which: 

Fig. 1 illustrates a source coding system containing an encoder enhanced by a parametric stereo encoder
module, and a decoder enhanced by a parametric stereo decoder module,
Fig. 2a is a block schematic of a parametric stereo decoder module,
Fig. 2b is a block schematic of a pseudo-stereo generator with control parameter inputs,
Fig. 2c is a block schematic of a balance adjuster with control parameter inputs,
Fig. 3 is a block schematic of a parametric stereo decoder module using multiband pseudo-stereo
generation combined with multiband balance adjustment,
Fig. 4a is a block schematic of the encoder side of a scalable HFR-based stereo codec, employing
level/balance coding of the spectral envelope,
Fig. 4b is a block schematic of the corresponding decoder side.


DESCRIPTION OF PREFERRED EMBODIMENTS 

The embodiments described below are merely illustrative of the principles of the present invention. It is
understood that modifications and variations of the arrangements and the details described herein will be
apparent to others skilled in the art. It is the intent, therefore, to be limited only by the scope of the
appended patent claims, and not by the specific details presented by way of description and explanation
of the embodiments herein. For the sake of clarity, all examples below assume two-channel systems, but,
as will be apparent to those skilled in the art, the methods can be applied to multichannel systems, such
as a 5.1 system.

Fig. 1 shows how an arbitrary source coding system comprising an encoder, 107, and a decoder, 115,
where encoder and decoder operate in monaural mode, can be enhanced by parametric stereo coding
according to the invention. Let L and R denote the left and right analog input signals, which are fed to an
AD converter, 101. The output from the AD converter is converted to mono, 105, and the mono signal is
encoded, 107. In addition, the stereo signal is routed to a parametric stereo encoder, 103, which calculates
one or several stereo parameters, described below. Those parameters are combined with the encoded
mono signal by means of a multiplexer, 109, forming a bitstream, 111. The bitstream is stored or
transmitted, and subsequently extracted at the decoder side by means of a demultiplexer, 113. The mono
signal is decoded, 115, and converted to a stereo signal by a parametric stereo decoder, 119, which uses
the stereo parameter(s), 117, as control signal(s). Finally, the stereo signal is routed to the DA converter,
121, which feeds the analog outputs, L' and R'. The topology of Fig. 1 is common to a set of parametric
stereo coding methods, which will be described in detail, starting with the less complex versions.
One method of parameterizing stereo properties according to the present invention is to determine the
stereo width of the original signal at the encoder side. A first approximation of the stereo width is the
difference signal, $D = L - R$, since, roughly put, a high degree of similarity between L and R computes to
a small value of D, and vice versa. A special case is dual mono, where $L = R$ and thus $D = 0$. Thus, even
this simple algorithm is capable of detecting the type of mono input signal commonly associated with
news broadcasts, in which case pseudo-stereo is not desired. However, a mono signal that is fed to L and
R at different levels does not yield a zero D signal, even though the perceived width is zero. Thus, in
practice more elaborate detectors might be required, employing for example cross-correlation methods.
One should make sure that the value describing the left-right difference or correlation is in some way
normalized with the total signal level, in order to achieve a level-independent detector. A problem with
the aforementioned detector is the case when mono speech is mixed with a much weaker stereo signal,
e.g. stereo noise or background music during speech-to-music/music-to-speech transitions. During speech
pauses, the detector will then indicate a wide stereo signal. This is solved by normalizing the stereo-width
value with a signal containing information on the previous total energy level, e.g. a peak-decay signal of
the total energy. Furthermore, to prevent the stereo-width detector from being triggered by high-frequency
noise or by high-frequency distortion that differs between the channels, the detector signals should be
pre-filtered by a low-pass filter, typically with a cutoff frequency somewhere above a voice's second
formant, and optionally also by a high-pass filter to avoid unbalanced signal offsets or hum. Regardless of
detector type, the calculated stereo width is mapped to a finite set of values, covering the entire range
from mono to wide stereo.
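A minimal Python sketch of such a detector follows. It assumes frame-wise processing of sample lists and uses the energy of the difference signal as the width measure (a cross-correlation measure could be substituted); the low-pass/high-pass pre-filtering is omitted. The class name, the decay constant and the number of states are illustrative assumptions, not values from the text.

```python
class WidthDetector:
    """Hypothetical frame-based stereo-width detector.

    decay and num_states are illustrative values, not taken from the text.
    """

    def __init__(self, decay=0.9, num_states=4):
        self.decay = decay
        self.num_states = num_states
        self.peak = 0.0  # peak-decay tracker of the total energy

    def process(self, left, right):
        total = sum(l * l + r * r for l, r in zip(left, right))
        # The tracker jumps up immediately and falls off slowly, so during a
        # speech pause a weak stereo background is measured against the
        # earlier, louder speech level instead of against itself.
        self.peak = max(total, self.decay * self.peak)
        diff = sum((l - r) ** 2 for l, r in zip(left, right))
        # (l - r)^2 <= 2 (l^2 + r^2), so diff / (2 * peak) lies in [0, 1].
        width = min(diff / (2.0 * self.peak + 1e-12), 1.0)
        # Map to a finite set of states: 0 = mono, num_states - 1 = widest.
        return min(int(width * self.num_states), self.num_states - 1)
```

Because this measure is based on L - R, a mono source fed to the two channels at different levels still reports a non-zero width, which is the limitation noted above.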

Fig. 2a gives an example of the contents of the parametric stereo decoder introduced in Fig. 1. The block
denoted 'balance', 211, controlled by parameter B, will be described later, and should be regarded as
bypassed for now. The block denoted 'width', 205, takes a mono input signal and synthetically recreates
the impression of stereo width, where the amount of width is controlled by the parameter W. The optional
parameters S and D will be described later. According to the invention, a subjectively better sound quality
can often be achieved by incorporating a crossover filter comprising a low-pass filter, 203, and a high-pass
filter, 201, in order to keep the low frequency range "tight" and unaffected. Here only the output from the
high-pass filter is routed to the width block. The stereo output from the width block is added to the mono
output from the low-pass filter by means of 207 and 209, forming the stereo output signal.
Any prior art pseudo-stereo generator can be used for the width block, such as those mentioned in the
background section, or a Schroeder-type early reflection simulating unit (multitap delay) or reverberator.
Fig. 2b gives an example of a pseudo-stereo generator, fed by a mono signal M. The amount of stereo
width is determined by the gain of 215, and this gain is a function of the stereo-width parameter, W. The
higher the gain, the wider the stereo impression; a zero gain corresponds to pure mono reproduction. The
output from 215 is delayed, 221, and added, 223 and 225, to the two direct signal instances, using
opposite signs. In order not to significantly alter the overall reproduction level when changing the stereo
width, a compensating attenuation of the direct signal can be incorporated, 213. For example, if the gain
of the delayed signal is G, the gain of the direct signal can be selected as $\sqrt{1 - G^2}$. According to the
invention, a high-frequency roll-off can be incorporated in the delay signal path, 217, which helps avoid
pseudo-stereo-induced unmasking of coding artifacts. Optionally, crossover filter, roll-off filter and delay
parameters can be sent in the bitstream, offering more possibilities to mimic the stereo properties of the
original signal, as also shown in Figs. 2a and 2b as the signals X, S and D. If a reverberation unit is used
for generating a stereo signal, the reverberation decay might sometimes be unwanted after the very end of
a sound. These unwanted reverb tails can, however, easily be attenuated or completely removed by simply
altering the gain of the reverb signal. A detector designed for finding sound endings can be used for that
purpose. If the reverberation unit generates artifacts for some specific signals, e.g. transients, a detector
for those signals can also be used for attenuating them.
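The Fig. 2b structure can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the gain G of block 215 is taken as already derived from W, the delay 221 is an integer number of samples, and the roll-off 217 is modelled by a hypothetical one-pole low-pass filter. The function name and default values are illustrative.

```python
import math


def pseudo_stereo(mono, gain, delay=10, rolloff=0.0):
    """Sketch of the Fig. 2b pseudo-stereo generator.

    gain:    G, gain of the delayed path (215), 0 <= G <= 1
    delay:   delay of the side path in samples (221); illustrative value
    rolloff: coefficient a of a hypothetical one-pole low-pass standing in
             for the roll-off filter (217); 0 disables the roll-off
    """
    direct_gain = math.sqrt(1.0 - gain * gain)  # compensating attenuation (213)
    left, right = [], []
    state = 0.0
    for n, m in enumerate(mono):
        delayed = mono[n - delay] if n >= delay else 0.0
        # One-pole low-pass: y[n] = (1 - a) x[n] + a y[n-1].
        state = (1.0 - rolloff) * delayed + rolloff * state
        side = gain * state
        direct = direct_gain * m
        left.append(direct + side)   # adder 223
        right.append(direct - side)  # adder 225, opposite sign
    return left, right
```

Since the side signal enters the two outputs with opposite signs, L + R equals $2\sqrt{1 - G^2}\,M$, so the enhancement cancels in a mono downmix; with G = 0 both outputs equal the input.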

An alternative method of detecting stereo properties according to the invention is described as follows.
Again, let L and R denote the left and right input signals. The corresponding signal powers are then given
by $P_L \sim L^2$ and $P_R \sim R^2$. A measure of the stereo balance can now be calculated as the quotient of
the two signal powers, or more specifically as $B = (P_L + e)/(P_R + e)$, where e is an arbitrary, very small
number that eliminates division by zero. The balance parameter, B, can be expressed in dB by the relation
$B_{dB} = 10\log_{10}(B)$. As an example, the three cases $P_L = 10P_R$, $P_L = P_R$ and $P_L = 0.1P_R$
correspond to balance values of +10 dB, 0 dB and -10 dB respectively. Clearly, those values map to the
locations "left", "center" and "right". Experiments have shown that the span of the balance parameter can
be limited to, for example, +/-40 dB, since those extreme values are already perceived as if the sound
originates entirely from one of the two loudspeakers or headphone drivers. This limitation reduces the
signal space to cover in the transmission, thus offering bitrate reduction. Furthermore, a progressive
quantization scheme can be used, whereby smaller quantization steps are used around zero and larger
steps towards the outer limits, which further reduces the bitrate. Often the balance is constant over time
for extended passages. Thus, a last step to significantly reduce the average number of bits needed can be
taken: after transmission of an initial balance value, only the differences between consecutive balance
values are transmitted, whereby entropy coding is employed. Very commonly, this difference is zero,
which is thus signaled by the shortest possible codeword. Clearly, in applications where bit errors are
possible, this delta coding must be reset at an appropriate time interval, in order to eliminate
uncontrolled error propagation.
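The parameter extraction and coding steps above can be sketched in Python. The sketch assumes whole-frame powers, a hypothetical progressive quantization table (fine steps near 0 dB, coarse towards +/-40 dB) and a fixed reset interval for the delta coding; the entropy coding of the deltas is left out. All names, table values and the interval are illustrative.

```python
import math

EPS = 1e-9         # the small number e that prevents division by zero
LIMIT_DB = 40.0    # example span from the text: +/-40 dB

# Hypothetical progressive quantizer: fine steps near 0 dB, coarse at the limits.
LEVELS_DB = [0.0, 2.0, 4.0, 7.0, 10.0, 15.0, 22.0, 30.0, 40.0]


def balance_db(left, right):
    """B = (P_L + e)/(P_R + e), expressed in dB and limited to +/-LIMIT_DB."""
    p_l = sum(x * x for x in left)
    p_r = sum(x * x for x in right)
    b_db = 10.0 * math.log10((p_l + EPS) / (p_r + EPS))
    return max(-LIMIT_DB, min(LIMIT_DB, b_db))


def quantize_balance(b_db):
    """Signed index of the nearest level in LEVELS_DB."""
    mag = abs(b_db)
    idx = min(range(len(LEVELS_DB)), key=lambda i: abs(LEVELS_DB[i] - mag))
    return idx if b_db >= 0 else -idx


def dequantize_balance(index):
    return math.copysign(LEVELS_DB[abs(index)], index)


def delta_encode(indices, reset_interval=8):
    """Delta-in-time coding with a periodic absolute value that limits
    error propagation. Zero deltas would get the shortest entropy codeword."""
    symbols = []
    for n, q in enumerate(indices):
        if n % reset_interval == 0:
            symbols.append(("abs", q))
        else:
            symbols.append(("delta", q - indices[n - 1]))
    return symbols


def delta_decode(symbols):
    values, prev = [], 0
    for kind, v in symbols:
        prev = v if kind == "abs" else prev + v
        values.append(prev)
    return values
```

For example, a frame with $P_L = 10P_R$ yields about +10 dB, which `quantize_balance` maps to index 4 of the assumed table.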

The most rudimentary decoder usage of the balance parameter is simply to offset the mono signal towards
either of the two reproduction channels, by feeding the mono signal to both outputs and adjusting the
gains correspondingly, as illustrated in Fig. 2c, blocks 227 and 229, with the control signal B. This is
analogous to turning the "panorama" knob on a mixing desk, synthetically "moving" a mono signal
between the two stereo speakers.
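A balance adjuster in the spirit of Fig. 2c can be sketched as below. The text does not specify the gain law of blocks 227 and 229; the sketch assumes a constant-power law in which the squared gains have the ratio B and sum to one. The function names are hypothetical.

```python
import math


def balance_gains(b_db):
    """Hypothetical constant-power gain law: g_L^2 / g_R^2 = B, g_L^2 + g_R^2 = 1."""
    b = 10.0 ** (b_db / 10.0)
    return math.sqrt(b / (1.0 + b)), math.sqrt(1.0 / (1.0 + b))


def balance_adjust(mono, b_db):
    """Feed the mono signal to both outputs with balance-dependent gains."""
    g_l, g_r = balance_gains(b_db)
    return [g_l * m for m in mono], [g_r * m for m in mono]
```

With this choice, 0 dB gives equal gains of $\sqrt{1/2}$, and the overall reproduction power does not depend on the balance setting.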

The balance parameter can be sent in addition to the above-described width parameter, offering the
possibility to both position and spread the sound image in the sound stage in a controlled manner, which
gives flexibility when mimicking the original stereo impression. One problem with combining
pseudo-stereo generation, as described in a previous section, with parameter-controlled balance is
unwanted signal contribution from the pseudo-stereo generator at balance positions far from the center
position. This is solved by applying a mono-favoring function to the stereo-width value, resulting in
greater attenuation of the stereo-width value at balance positions at the extreme sides, and less or no
attenuation at balance positions close to the center position.
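One possible mono-favoring function is sketched below. Its shape is purely an assumption: no attenuation within a hypothetical knee of +/-6 dB around the center, falling linearly to zero width at +/-40 dB. The text only requires more attenuation towards the extreme side positions and little or none near the center.

```python
def mono_favoring_width(width, b_db, knee_db=6.0, limit_db=40.0):
    """Hypothetical mono-favoring function applied to a stereo-width value.

    No attenuation within +/-knee_db of center; linear fall-off to zero
    width at +/-limit_db. Both breakpoints are illustrative assumptions.
    """
    offset = abs(b_db)
    if offset <= knee_db:
        factor = 1.0
    elif offset >= limit_db:
        factor = 0.0
    else:
        factor = 1.0 - (offset - knee_db) / (limit_db - knee_db)
    return width * factor
```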

The methods described so far are intended for very low bitrate applications. In applications where higher
bitrates are available, it is possible to use more elaborate versions of the above width and balance
methods. Stereo-width detection can be made in several frequency bands, resulting in individual
stereo-width values for each frequency band. Similarly, balance calculation can operate in a multiband
fashion, which is equivalent to applying different filter curves to two channels that are fed by a mono
signal. Fig. 3 shows an example of a parametric stereo decoder using a set of N pseudo-stereo generators
according to Fig. 2b, represented by blocks 307, 317 and 327, combined with multiband balance
adjustment, represented by blocks 309, 319 and 329, as described in Fig. 2c. The individual passbands are
obtained by feeding the mono input signal, M, to a set of bandpass filters, 305, 315 and 325. The
bandpass stereo outputs from the balance adjusters are added, 311, 321, 313, 323, forming the stereo
output signal, L and R. The formerly scalar width and balance parameters are now replaced by the arrays
W(k) and B(k). In Fig. 3, every pseudo-stereo generator and balance adjuster has unique stereo
parameters. However, in order to reduce the total amount of data to be transmitted or stored, parameters
from several frequency bands can be averaged in groups at the encoder, and this smaller number of
parameters can be mapped to the corresponding groups of width and balance blocks at the decoder.
Clearly, different grouping schemes and lengths can be used for the arrays W(k) and B(k). S(k) represents
the gains of the delay signal paths in the width blocks, and D(k) represents the delay parameters. Again,
S(k) and D(k) are optional in the bitstream.
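The grouping of per-band parameters at the encoder, and the mapping back onto the groups of width and balance blocks at the decoder, can be sketched as follows. The group sizes are illustrative, balance values are assumed to be averaged in dB, and the function names are hypothetical.

```python
def group_parameters(values, group_sizes):
    """Encoder side: average per-band parameters (e.g. W(k) or B(k) in dB) in groups."""
    grouped, start = [], 0
    for size in group_sizes:
        chunk = values[start:start + size]
        grouped.append(sum(chunk) / len(chunk))
        start += size
    return grouped


def expand_parameters(grouped, group_sizes):
    """Decoder side: apply each transmitted value to every band in its group."""
    per_band = []
    for value, size in zip(grouped, group_sizes):
        per_band.extend([value] * size)
    return per_band
```

For example, eight width values with the grouping [2, 2, 4] are reduced to three transmitted values, one per group.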



WO 03/007656 



PCT/SE02/01372 



The parametric balance coding method can, especially for lower frequency bands, exhibit somewhat
unstable behavior, due to lack of frequency resolution, or due to too many sound events occurring in one
frequency band at the same time but at different balance positions. Those balance glitches are usually
characterized by a deviant balance value during just a short period of time, typically one or a few
consecutive calculated values, depending on the update rate. In order to avoid disturbing balance glitches,
a stabilization process can be applied to the balance data. This process may use a number of balance
values before and after the current time position to calculate their median value. The median value can
subsequently be used as a limiter value for the current balance value, i.e. the current balance value
should not be allowed to go beyond the median value. The current value is then limited to the range
between the last value and the median value. Optionally, the current balance value can be allowed to
exceed these limits by a certain overshoot factor. Furthermore, the overshoot factor, as well as the number
of balance values used for calculating the median, should be seen as frequency-dependent properties and
hence be chosen individually for each frequency band.
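The median-based stabilization can be sketched in Python as follows, assuming balance values in dB. The window uses future values, so in a real system it implies a look-ahead delay of `half_window` update periods. `half_window` and the overshoot factor are illustrative and, per the text, would be chosen per frequency band; the function name is hypothetical.

```python
import statistics


def stabilize_balance(values, half_window=2, overshoot=0.0):
    """Median-based de-glitching of a balance track (values in dB).

    Each value is clipped to the interval spanned by the previous output
    value and the median of the surrounding window; the overshoot factor
    widens that interval by a fraction of its length.
    """
    out = []
    for n, current in enumerate(values):
        window = values[max(0, n - half_window):n + half_window + 1]
        med = statistics.median(window)
        last = out[-1] if out else current
        lo, hi = min(last, med), max(last, med)
        margin = overshoot * (hi - lo)
        out.append(min(max(current, lo - margin), hi + margin))
    return out
```

A single deviant value is suppressed, while a sustained change of balance passes through unchanged.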

At low update rates of the balance information, the lack of time resolution can cause failure in
synchronization between motions of the stereo image and the actual sound events. To improve this
synchronization, an interpolation scheme based on identifying sound events can be used. Interpolation
here refers to interpolation between two balance values that are consecutive in time. By studying the
mono signal at the receiver side, information about the beginnings and ends of different sound events can
be obtained. One way is to detect a sudden increase or decrease of signal energy in a particular frequency
band. Guided by that energy envelope in time, the interpolation should ensure that changes in balance
position are preferably performed during time segments containing little signal energy. Since the human
ear is more sensitive to the onsets than to the trailing parts of a sound, the interpolation scheme benefits
from finding the beginning of a sound by e.g. applying peak-hold to the energy and then letting the
balance value increments be a function of the peak-held energy, where a small energy value gives a large
increment and vice versa. For time segments containing energy uniformly distributed in time, i.e. as for
some stationary signals, this interpolation method equals linear interpolation between the two balance
values. If the balance values are quotients of left and right energies, logarithmic balance values are
preferred, for reasons of left-right symmetry. Another advantage of applying the whole interpolation
algorithm in the logarithmic domain is the human ear's tendency to perceive levels on a logarithmic scale.
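The energy-guided interpolation can be sketched as follows. This is one possible reading of the text, under assumptions: balance values are in dB, the peak-held energy decays by a hypothetical factor per step, and each increment is made inversely proportional to the held energy, with the increments normalized so that the path ends exactly at the new balance value. The function name and constants are illustrative.

```python
def guided_interpolation(b_start_db, b_end_db, energies, decay=0.95, eps=1e-12):
    """Hypothetical energy-guided interpolation between two consecutive
    balance values (in dB) over one update interval.

    energies: per-step energy of the decoded mono signal in the band.
    """
    held, weights = 0.0, []
    for e in energies:
        # Peak-hold with slow decay: rises instantly at an onset.
        held = max(e, decay * held)
        # Small held energy -> large increment, and vice versa.
        weights.append(1.0 / (held + eps))
    total = sum(weights)
    path, acc = [], 0.0
    for w in weights:
        acc += w
        path.append(b_start_db + (b_end_db - b_start_db) * acc / total)
    return path
```

With uniform energy the weights are equal and the path is linear; when a sound starts mid-interval, almost the entire balance change happens in the quiet segment before the onset.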

Also, for low update rates of the stereo-width gain values, interpolation can be applied to them. A simple
way is to interpolate linearly between two stereo-width values that are consecutive in time. More stable
behavior of the stereo width can be achieved by smoothing the stereo-width gain values over a longer time
segment containing several stereo-width parameters. By utilizing smoothing with different attack and
release time constants, a system well suited for program material containing mixed or interleaved speech
and music is achieved. An appropriate design of such a smoothing filter uses a short attack time constant,
to get a short rise time and hence an immediate response to music entries in stereo, and a long release
time constant, to get a long fall time. To be able to switch quickly from a wide stereo mode to mono,
which can be desirable for sudden speech entries, the smoothing filter can be bypassed or reset by
signaling this event. Furthermore, attack time constants, release time constants and other smoothing filter
characteristics can also be signaled by an encoder.
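A minimal sketch of attack/release smoothing of the stereo-width gain follows. The one-pole smoother, the coefficient values and the class name are assumptions for illustration; a `reset` flag models the signaled bypass for sudden speech entries.

```python
class WidthSmoother:
    """Hypothetical attack/release smoother for stereo-width gain values.

    attack and release are one-pole coefficients in [0, 1): values near 0
    react almost instantly, values near 1 react slowly. The defaults are
    illustrative; the text notes they could also be signaled by an encoder.
    """

    def __init__(self, attack=0.2, release=0.95):
        self.attack = attack
        self.release = release
        self.state = 0.0

    def process(self, width, reset=False):
        if reset:
            # Signaled event, e.g. a sudden speech entry: jump directly.
            self.state = width
        else:
            # Short attack for rising width, long release for falling width.
            coef = self.attack if width > self.state else self.release
            self.state = coef * self.state + (1.0 - coef) * width
        return self.state
```

Rising width (a stereo music entry) is followed within a few updates, while a drop back to mono decays slowly unless a reset is signaled.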

For signals containing masked distortion from a psychoacoustic codec, one common problem with
introducing stereo information based on the coded mono signal is an unmasking effect of the distortion.
This phenomenon, usually referred to as "stereo unmasking", is the result of non-centered sounds that do
not fulfill the masking criterion. The problem of stereo unmasking might be solved, or partly solved, by
introducing at the decoder side a detector aimed at such situations. Known technologies for measuring
signal-to-mask ratios can be used to detect potential stereo unmasking. Once detected, it can be explicitly
signaled, or the stereo parameters can simply be decreased.

At the encoder side, one option, as taught by the invention, is to apply a Hilbert transformer to the input
signal, i.e. a 90 degree phase shift between the two channels is introduced. When the mono signal is
subsequently formed by adding the two signals, a better balance between a center-panned mono signal
and "true" stereo signals is achieved, since the Hilbert transformation introduces a 3 dB attenuation for
center information. In practice, this improves mono coding of e.g. contemporary pop music, where for
instance the lead vocals and the bass guitar are commonly recorded using a single mono source.
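The 3 dB figure can be checked numerically with a small Python sketch. It assumes the 90 degree shift is applied to the right channel only, via a DFT-based discrete Hilbert transform written out directly (O(N^2), chosen for self-containedness rather than efficiency). The test signals are periodic within the frame, so the transform is exact; the function names are illustrative.

```python
import cmath
import math


def hilbert(x):
    """Discrete Hilbert transform via a direct DFT (O(N^2), demo only).

    Positive-frequency bins are multiplied by -j, negative-frequency bins
    by +j, and the DC (and, for even N, Nyquist) bins are zeroed.
    """
    n = len(x)
    spec = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]
    for k in range(n):
        if 0 < k < n / 2:
            spec[k] *= -1j
        elif k > n / 2:
            spec[k] *= 1j
        else:
            spec[k] = 0
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def downmix(left, right, use_hilbert=True):
    """Mono sum; optionally phase-shift the right channel by 90 degrees first."""
    shifted = hilbert(right) if use_hilbert else right
    return [l + r for l, r in zip(left, shifted)]


def power(x):
    return sum(v * v for v in x) / len(x)


n = 64
center = [math.cos(2 * math.pi * 4 * t / n) for t in range(n)]
plain = downmix(center, center, use_hilbert=False)
shifted = downmix(center, center)
# Center-panned content loses about 3 dB relative to a plain L + R sum.
print(round(10 * math.log10(power(plain) / power(shifted)), 2))  # 3.01
```

For uncorrelated left and right content the downmix power is the same with or without the phase shift, so only the center information is attenuated.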

The multiband balance-parameter method is not limited to the type of application described in Fig. 1. It
can be advantageously used whenever the objective is to efficiently encode the power spectral envelope of
a stereo signal. Thus, it can be used as a tool in stereo codecs, where a corresponding stereo residual is
coded in addition to the stereo spectral envelope. Let the total power P be defined by $P = P_L + P_R$,
where $P_L$ and $P_R$ are signal powers as described above. Note that this definition does not take
left-to-right phase relations into account. (E.g. identical left and right signals of opposite signs do not
yield a zero total power.) Analogous to B, P can be expressed in dB as $P_{dB} = 10\log_{10}(P/P_{ref})$,
where $P_{ref}$ is an arbitrary reference power, and the delta values can be entropy coded. As opposed to
the balance case, no progressive quantization is employed for P. In order to represent the spectral
envelope of a stereo signal, P and B are calculated for a set of frequency bands, typically, but not
necessarily, with bandwidths that are related to the critical bands of human hearing. For example, those
bands may be formed by grouping channels of a constant-bandwidth filterbank, whereby $P_L$ and $P_R$
are calculated as the time and frequency averages of the squares of the subband samples corresponding to
the respective band and time period. The sets $P_0, P_1, P_2, \ldots, P_{N-1}$ and
$B_0, B_1, B_2, \ldots, B_{N-1}$, where the subscripts denote the frequency band in an N-band
representation, are delta and Huffman coded, transmitted or stored, and finally decoded into the quantized
values that were calculated in the encoder. The last step is to convert P and B back to $P_L$ and $P_R$. As
easily seen from the definitions of P and B, the reverse relations are (when neglecting e in the definition
of B) $P_L = BP/(B + 1)$ and $P_R = P/(B + 1)$.
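The per-band envelope computation and its inversion can be sketched as follows. The sketch assumes the subband samples have already been grouped per band (time slots and filterbank channels flattened into one list per band) and leaves out quantization and delta/Huffman coding. The function names are hypothetical.

```python
EPS = 1e-12  # the small number e in the definition of B


def band_envelope(left_bands, right_bands):
    """Per-band total power P and balance B from grouped subband samples.

    P_L and P_R are the mean squares of the samples belonging to a band.
    """
    envelope = []
    for lb, rb in zip(left_bands, right_bands):
        p_l = sum(v * v for v in lb) / len(lb)
        p_r = sum(v * v for v in rb) / len(rb)
        envelope.append((p_l + p_r, (p_l + EPS) / (p_r + EPS)))
    return envelope


def envelope_to_lr(p, b):
    """Inverse relations, neglecting e: P_L = B P/(B + 1), P_R = P/(B + 1)."""
    return b * p / (b + 1.0), p / (b + 1.0)
```

For a band with $P_L = 1$ and $P_R = 0.25$ this gives $P = 1.25$ and $B = 4$, and the inversion recovers the two channel powers.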

One particularly interesting application of the above envelope coding method is the coding of highband
spectral envelopes for HFR-based codecs. In this case no highband residual signal is transmitted; instead,
this residual is derived from the lowband. Thus, there is no strict relation between residual and envelope
representation, and envelope quantization is more crucial. In order to study the effects of quantization, let
$P_q$ and $B_q$ denote the quantized values of P and B respectively. Inserting $P_q$ and $B_q$ into the
above relations and forming the sum gives:

$$P_{Lq} + P_{Rq} = \frac{B_q P_q}{B_q + 1} + \frac{P_q}{B_q + 1} = \frac{P_q (B_q + 1)}{B_q + 1} = P_q.$$

The interesting feature here is that $B_q$ is eliminated, and the error in total power is solely determined
by the quantization error in P. This implies that even if B is heavily quantized, the perceived level is
correct, provided that sufficient precision is used in the quantization of P. In other words, distortion in B
maps to distortion in space rather than in level. As long as the sound sources are stationary in space over
time, this distortion in the stereo perspective is also stationary, and hard to notice. As already stated, the
quantization of the stereo-balance can also be coarser towards the outer extremes since, due to properties
of human hearing, a given error in dB corresponds to a smaller error in perceived angle when the angle to
the centerline is large.
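The two properties just described, level-preserving decoding regardless of the balance error and a balance quantizer that is finer near the center, can be checked numerically. In this sketch the balance reconstruction levels in dB and the 1.5 dB level step are hypothetical values chosen for illustration, not values taken from the source.

```python
import math

# Hypothetical balance reconstruction levels in dB (B = P_L/P_R): dense around
# the centre (0 dB) and sparser towards the extremes. Illustrative only.
BALANCE_LEVELS_DB = [-30.0, -18.0, -12.0, -8.0, -5.0, -3.0, -1.5, 0.0,
                     1.5, 3.0, 5.0, 8.0, 12.0, 18.0, 30.0]


def quantize_balance_db(b):
    """Nearest balance level in dB for a linear balance value b > 0."""
    b_db = 10.0 * math.log10(max(b, 1e-12))
    return min(BALANCE_LEVELS_DB, key=lambda level: abs(level - b_db))


def quantize_level(p, step_db=1.5):
    """Uniform quantization of the level P in dB; the step size is an assumption."""
    return 10.0 ** (round(10.0 * math.log10(p) / step_db) * step_db / 10.0)


def decode_powers(pq, bq):
    """Reverse relations applied to the quantized values."""
    return bq * pq / (bq + 1.0), pq / (bq + 1.0)


p, b = 3.0, 0.5
pq = quantize_level(p)
bq = 10.0 ** (quantize_balance_db(b) / 10.0)
pl_q, pr_q = decode_powers(pq, bq)
# The balance error shifts the stereo image, but the total power stays Pq.
assert math.isclose(pl_q + pr_q, pq)
```

Any monotone set of levels with growing spacing away from 0 dB would serve the same purpose; the essential point is that the precision budget is spent on P and on near-center balance positions.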


When quantizing frequency-dependent data, e.g. multiband stereo-width gain values or multiband
balance values, the resolution and range of the quantization method can advantageously be selected to
match the properties of a perceptual scale. If such a scale is made frequency dependent, different
quantization methods, or so-called quantization classes, can be chosen for the different frequency bands.
Encoded parameter values representing different frequency bands should then, in some cases, be
interpreted differently even when the values are identical, i.e. be decoded into different values.

Analogous to a switched L/R- to S/D-coding scheme, the P and B signals may be adaptively substituted
by the $P_L$ and $P_R$ signals in order to better cope with extreme signals. As taught by [PCT/SE00/00158],
delta coding of envelope samples can be switched from delta-in-time to delta-in-frequency, depending on
which direction is most efficient in terms of number of bits at a particular moment. The balance parameter
can also take advantage of this scheme. Consider, for example, a source that moves in the stereo field over
time. Clearly, this corresponds to a successive change of balance values over time which, depending on
the speed of the source versus the update rate of the parameters, may result in large delta-in-time
values and hence long codewords when employing entropy coding. However, assuming that the
source has uniform sound radiation versus frequency, the delta-in-frequency values of the balance
parameter are zero at every point in time, corresponding to short codewords. Thus, a lower bitrate
is achieved in this case when using the frequency delta coding direction. Another example is a source
that is stationary in the room but has non-uniform radiation. Now the delta-in-frequency values are
large, and delta-in-time is the preferred choice.
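The adaptive choice of delta direction can be sketched as below. Because the actual Huffman tables are not given here, the codeword length is estimated with a signed order-0 Exp-Golomb code as a stand-in. The first band is coded absolutely in the frequency direction, and the cost of the direction flag is ignored; both are simplifying assumptions.

```python
def codeword_bits(v):
    """Stand-in for a Huffman table: length of a signed order-0 Exp-Golomb code."""
    u = 2 * v - 1 if v > 0 else -2 * v
    return 2 * (u + 1).bit_length() - 1


def delta_freq(cur):
    # First band sent as-is, the rest as differences to the band below.
    return [cur[0]] + [cur[k] - cur[k - 1] for k in range(1, len(cur))]


def delta_time(cur, prev):
    # Differences to the same band in the previous frame.
    return [c - p for c, p in zip(cur, prev)]


def choose_direction(cur, prev):
    """Return ('time' or 'freq', deltas) with the fewer estimated bits.
    cur and prev are quantized balance indices per band; prev is None
    for the first frame, where only frequency deltas are possible."""
    df = delta_freq(cur)
    bits_f = sum(codeword_bits(d) for d in df)
    if prev is None:
        return "freq", df
    dt = delta_time(cur, prev)
    bits_t = sum(codeword_bits(d) for d in dt)
    return ("time", dt) if bits_t < bits_f else ("freq", df)


# Source moving in the stereo field, uniform radiation: frequency deltas win.
print(choose_direction([5] * 6, [2] * 6)[0])  # freq
# Stationary source, non-uniform radiation: time deltas win.
print(choose_direction([0, 4, -3, 6, 1, -5], [0, 4, -3, 6, 1, -5])[0])  # time
```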

The P/B-coding scheme offers the possibility to build a scalable HFR-codec, see Fig. 4. A scalable codec
is characterized in that the bitstream is split into two or more parts, where the reception and decoding of
higher-order parts is optional. The example assumes two bitstream parts, hereinafter referred to as
primary, 419, and secondary, 417, but extension to a higher number of parts is clearly possible. The
encoder side, Fig. 4a, comprises an arbitrary stereo lowband encoder, 403, which operates on the stereo
input signal, IN (the trivial steps of A/D and D/A conversion are not shown in the figure); a parametric
stereo encoder, 401, which estimates the highband spectral envelope and, optionally, additional stereo
parameters, and which also operates on the stereo input signal; and two multiplexers, 415 and 413, for
the primary and secondary bitstreams respectively. In this application, the highband envelope coding is
locked to P/B operation, and the P signal, 407, is sent to the primary bitstream by means of 415, whereas
the B signal, 405, is sent to the secondary bitstream by means of 413.
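The routing on the encoder side can be illustrated with a minimal sketch of one frame. The lowband payloads are treated as opaque bytes, and the dictionary layout and the function name are hypothetical. Quantization and entropy coding of P and B are omitted, and the same assumed definitions $P = P_L + P_R$ and $B = P_L/(P_R + \varepsilon)$ are used.

```python
def encode_highband_frame(p_left, p_right, lowband_primary, lowband_secondary,
                          eps=1e-12):
    """Hypothetical sketch of one encoder frame of Fig. 4a.

    p_left, p_right: per-band highband powers of the stereo input.
    lowband_primary, lowband_secondary: opaque lowband encoder (403) outputs,
    i.e. the signals 411 and 409. Returns the payloads handed to the
    multiplexers 415 (primary) and 413 (secondary).
    """
    level = [l + r for l, r in zip(p_left, p_right)]            # P, signal 407
    balance = [l / (r + eps) for l, r in zip(p_left, p_right)]  # B, signal 405
    primary = {"lowband": lowband_primary, "P": level}
    secondary = {"lowband": lowband_secondary, "B": balance}
    return primary, secondary


primary, secondary = encode_highband_frame([1.0, 3.0], [1.0, 1.0], b"core-S", b"core-D")
```

Because the total level goes to the primary part, a decoder that receives only the primary bitstream still has the complete highband level information. Only the stereo placement carried by B is lost.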

For the lowband codec, different possibilities exist. It may operate constantly in S/D mode, with the S and
D signals sent to the primary and secondary bitstreams respectively. In this case, decoding of the
primary bitstream results in a full-band mono signal. Of course, this mono signal can be enhanced by
parametric stereo methods according to the invention, in which case the stereo-parameter(s) must also be
located in the primary bitstream. Another possibility is to feed a stereo-coded lowband signal to the
primary bitstream, optionally together with highband width- and balance-parameters. Decoding of the
primary bitstream then results in true stereo for the lowband and very realistic pseudo-stereo for the
highband, since the stereo properties of the lowband are reflected in the high frequency reconstruction.
Stated another way: even though the available highband envelope representation, or spectral coarse
structure, is in mono, the synthesized highband residual, or spectral fine structure, is not. In this type of
implementation, the secondary bitstream may contain more lowband information which, when combined
with that of the primary bitstream, yields a higher-quality lowband reproduction. The topology of Fig. 4
illustrates both cases, since the primary and secondary lowband encoder output signals, 411 and 409,
connected to 415 and 413 respectively, may contain either of the signal types described above.

The bitstreams are transmitted or stored, and either only 419 or both 419 and 417 are fed to the decoder,
Fig. 4b. The primary bitstream is demultiplexed by 423 into the lowband core decoder primary signal,
429, and the P signal, 431. Similarly, the secondary bitstream is demultiplexed by 421 into the lowband
core decoder secondary signal, 427, and the B signal, 425. The lowband signal(s) is (are) routed to the
lowband decoder, 433, which produces an output, 435, which again, in case of decoding of the primary
bitstream only, may be of either type described above (mono or stereo). The signal 435 feeds the
HFR-unit, 437, wherein a synthetic highband is generated and adjusted according to P, which is also
connected to the HFR-unit. The decoded lowband is combined with the highband in the HFR-unit, and
the lowband and/or highband is optionally enhanced by a pseudo-stereo generator (also situated in the
HFR-unit) before finally being fed to the system outputs, forming the output signal, OUT. When the
secondary bitstream, 417, is present, the HFR-unit also gets the B signal, 425, as an input signal, and 435
is in stereo, whereby the system produces a full stereo output signal, and pseudo-stereo generators, if any,
are bypassed.
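The envelope path into the HFR-unit for the two reception cases can be sketched as follows. When the secondary bitstream is missing, this sketch assumes a centered balance ($B = 1$, so each channel receives $P/2$). That fallback is an illustrative assumption and is not stated in the text. The names and the dictionary layout are hypothetical.

```python
def decode_highband_envelope(primary, secondary=None):
    """Hypothetical sketch of the envelope input to the HFR-unit (437).

    primary:   dict with the per-band level "P" (signal 431).
    secondary: dict with the per-band balance "B" (signal 425), or None when
               only the primary bitstream (419) was received.
    Returns (left_powers, right_powers, full_stereo). Without B, a centred
    balance B = 1 is assumed, so each channel gets P/2 (an assumption).
    """
    p = primary["P"]
    full_stereo = secondary is not None
    b = secondary["B"] if full_stereo else [1.0] * len(p)
    left = [bi * pi / (bi + 1.0) for pi, bi in zip(p, b)]
    right = [pi / (bi + 1.0) for pi, bi in zip(p, b)]
    # full_stereo=True corresponds to bypassing any pseudo-stereo generators.
    return left, right, full_stereo


left, right, full = decode_highband_envelope({"P": [4.0, 2.0]})
# left == right == [2.0, 1.0] and full is False: mono envelope, primary only
```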
1. A method for coding of stereo properties of an input signal, characterised by:

at an encoder, calculate a width-parameter that signals a stereo-width of said input signal, and
at a decoder, generate a stereo output signal, using said width-parameter to control a stereo-width
of said output signal.

2. A method according to claim 1, characterised by: 

at said encoder, form a mono signal from said input signal, and
at said decoder, said generation implies a pseudo-stereo method operating on said mono signal.

3. A method according to claim 2, characterised in that said pseudo-stereo method implies splitting of 
said mono signal into two signals as well as addition of delayed version(s) of said mono signal to said two 
signals, at level(s) controlled by said width-parameter. 


4. A method according to claim 3, characterised in that said delayed version(s) are high-pass filtered 
and progressively attenuated at higher frequencies prior to being added to said two signals. 

5. A method according to claim 1, characterised in that said width-parameter is a vector, and the
elements of said vector correspond to separate frequency bands.

6. A method according to claims 1 - 5, characterised in that if said input signal is of type dual mono, 
said output signal is also of type dual mono. 

7. A method for coding of stereo properties of an input signal, characterised by:

at an encoder, calculate a balance-parameter that signals a stereo-balance of said input signal, and 
at a decoder, generate a stereo output signal, using said balance-parameter to control a stereo- 
balance of said output signal. 

8. A method according to claim 7, characterised by:

at said encoder, form a mono signal from said input signal, and 

at said decoder, said generation implies splitting of said mono signal into two signals, and said 
control implies adjustment of levels of said two signals. 

9. A method according to claim 7, characterised in that a power for each channel of said input signal is
calculated, and said balance-parameter is calculated from a quotient between said powers.
10. A method according to claim 9, characterised in that said powers and said balance-parameter are
vectors where every element corresponds to a specific frequency band.

11. A method according to claim 7, characterised by, at said decoder, interpolating between two
time-consecutive values of said balance-parameter in such a way that the momentary value of the
corresponding power of said mono signal controls how steep the momentary interpolation should be.

12. A method according to claim 11, characterised in that said interpolation method is performed on 
balance values represented as logarithmic values. 


13. A method according to claim 7, characterised in that said values of balance-parameters are limited to 
a range between a previous balance value, and a balance value extracted from other balance values by a 
median filter or other filter process, where said range can be further extended by moving the borders of 
said range by a certain factor. 


14. A method according to claim 13, characterised in that said method of extracting limiting borders for
balance values is, for a multiband system, frequency dependent.

15. A method according to claim 10, characterised in that an additional level-parameter is calculated as
a vector sum of said powers and sent to said decoder, thereby providing said decoder a representation of a
spectral envelope of said input signal.

16. A method according to claim 15, characterised in that said level-parameter and said
balance-parameter are adaptively replaced by said powers.


17. A method according to claim 16, characterised in that said spectral envelope is used to control an
HFR-process in a decoder.

18. A method according to claim 15, characterised in that said level-parameter is fed into a primary
bitstream of a scalable HFR-based stereo codec, and said balance-parameter is fed into a secondary
bitstream of said codec.



19. A method according to claims 2 and 18, characterised in that said mono signal and said width- 
parameter are fed into said primary bitstream. 




20. A method according to claims 5 and 16, characterised in that said width-parameters are processed by 
a function that gives smaller values for a balance value that corresponds to a balance position further from 
the center position. 

21. A method according to any of claims 7 - 18, characterised in that a quantization of said balance-
parameter employs smaller quantization steps around a center position and larger steps towards outer
positions.

22. A method according to claims 5 and 21, characterised in that said width-parameters and said
balance-parameters are quantized using a quantization method in terms of resolution and range which, for
a multiband system, is frequency dependent.

23. A method according to any of claims 10-18, characterised in that said balance-parameter
is adaptively delta-coded either in time or in frequency.


24. A method according to any of claims 2 and 8, characterised in that said input signal is passed
through a Hilbert transformer prior to forming said mono signal.

25. An apparatus for parametric stereo coding, characterised by: 

at an encoder, means for calculation of a width-parameter that signals a stereo-width of an input
signal, and means for forming a mono signal from said input signal,

at a decoder, means for generating a stereo output signal from said mono signal, using said width- 
parameter to control a stereo-width of said output signal. 
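The balance limiting of claim 13 can be illustrated with a minimal sketch for a single band. It assumes positive linear balance values, takes the median over a short window of recent values for the same band, and moves the borders outward multiplicatively by a factor of at least 1. The window, the factor, and the function name are hypothetical choices. Per claim 14, a multiband system could use a different factor or window per band.

```python
import statistics


def limit_balance(new_b, prev_b, recent, factor=1.5):
    """Clamp a decoded linear balance value (B > 0) for one band.

    The allowed range spans the previous value prev_b and the median of the
    recent balance values, with both borders moved outward by `factor` (>= 1).
    The window `recent` and the default factor are hypothetical choices.
    """
    med = statistics.median(recent)
    low = min(prev_b, med) / factor
    high = max(prev_b, med) * factor
    return min(max(new_b, low), high)


print(limit_balance(100.0, 1.0, [1.0, 2.0, 1.5]))  # 2.25
print(limit_balance(1.2, 1.0, [1.0, 2.0, 1.5]))    # 1.2
```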